AITopics

2604.03721

Country:

Europe > Austria > Vienna (0.14)
Europe > Germany > Bremen > Bremen (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(11 more...)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.71)

Neural Information Processing Systems, Feb-10-2026, 10:14:01 GMT

39e9c5913c970e3e49c2df629daff636-Paper-Conference.pdf

dataset, exp, representation, (16 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Diagnostic Medicine (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Vision (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

Neural Information Processing Systems, Feb-8-2026, 04:05:42 GMT

3ce3bd7d63a2c9c81983cc8e9bd02ae5-AuthorFeedback.pdf

reppoint, revision, verification task, (12 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.96)

Neural Information Processing Systems, Dec-26-2025, 14:12:22 GMT

Semi-Supervised Contrastive Learning for Deep Regression with Ordinal Rankings from Spectral Seriation

Contrastive learning methods can be applied to deep regression by enforcing label distance relationships in feature space. However, unlike in classification, where unlabeled data can be used for contrastive pretraining, these methods are limited to labeled data. In this work, we extend contrastive regression methods to allow unlabeled data to be used in a semi-supervised setting, thereby reducing the reliance on manual annotations. We observe that the feature similarity matrix between unlabeled samples still reflects inter-sample relationships, and that an accurate ordinal relationship can be recovered through spectral seriation algorithms if the level of error is within certain bounds. By using the recovered ordinal relationship for contrastive learning on unlabeled samples, we allow more data to be used for feature representation learning, thereby achieving more robust results. The ordinal rankings can also be used to supervise predictions on unlabeled samples, which can serve as an additional training signal. We provide theoretical guarantees and empirical support through experiments on different datasets, demonstrating that our method can surpass existing state-of-the-art semi-supervised deep regression methods. To the best of our knowledge, this work is the first to explore using unlabeled data to perform contrastive learning for regression.
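
The seriation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the classical Fiedler-vector form of spectral seriation on a toy similarity matrix; the function name `spectral_seriation` and the synthetic labels are hypothetical and are not taken from the paper.

```python
import numpy as np

def spectral_seriation(similarity):
    """Order samples by the Fiedler vector of the similarity graph Laplacian."""
    S = np.asarray(similarity, dtype=float)
    S = (S + S.T) / 2.0                    # enforce symmetry
    L = np.diag(S.sum(axis=1)) - S         # unnormalized graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]                # eigenvector of the 2nd-smallest eigenvalue
    return np.argsort(fiedler)             # ordering, defined only up to reversal

# Toy data (assumption): hidden scalar labels, similarity decays with label distance.
labels = np.array([0.3, 2.0, 1.1, 4.2, 3.5, 0.9])
similarity = np.exp(-np.abs(labels[:, None] - labels[None, :]))

order = spectral_seriation(similarity)
print(order)                # recovered ranking of the samples
print(labels[order])        # labels along the ranking: monotone, ascending or descending
```

The ranking is recovered without looking at the labels, but its direction is ambiguous, so some labeled samples would still be needed to decide which end is "low".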

deep regression, ordinal ranking, semi-supervised contrastive learning, (7 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Nakamoto, Carter H., Chen, Lucia Lushi, Foryciarz, Agata, Rose, Sherri

Penalized Fair Regression for Multiple Groups in Chronic Kidney Disease

arXiv.org Machine Learning, Dec-22-2025

Fair regression methods have the potential to mitigate societal bias concerns in health care, but there has been little work on penalized fair regression when multiple groups experience such bias. We propose a general regression framework that addresses this gap with unfairness penalties for multiple groups. Our approach is demonstrated for binary outcomes with true positive rate disparity penalties. It can be efficiently implemented through reduction to a cost-sensitive classification problem. We additionally introduce novel score functions for automatically selecting penalty weights. Our penalized fair regression methods are empirically studied in simulations, where they achieve a fairness-accuracy frontier beyond that of existing comparison methods. Finally, we apply these methods to a national multi-site primary care study of chronic kidney disease to develop a fair classifier for end-stage renal disease. There we find substantial improvements in fairness for multiple race and ethnicity groups who experience societal bias in the health care system without any appreciable loss in overall fit.
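
To make the idea of a true positive rate disparity penalty concrete, here is a minimal sketch. The penalty form (a weighted sum of absolute gaps between each group's TPR and the overall TPR), the function names, and the toy data are all assumptions for illustration; the paper's exact penalty and weight-selection scores are not reproduced here.

```python
def true_positive_rate(y_true, y_pred):
    """Fraction of actual positives that were predicted positive."""
    positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(positives) / len(positives) if positives else 0.0

def tpr_disparity_penalty(y_true, y_pred, groups, weights):
    """Hypothetical penalty: sum_g weight_g * |TPR_overall - TPR_g|."""
    overall = true_positive_rate(y_true, y_pred)
    penalty = 0.0
    for g, w in weights.items():
        idx = [i for i, gi in enumerate(groups) if gi == g]
        tpr_g = true_positive_rate([y_true[i] for i in idx],
                                   [y_pred[i] for i in idx])
        penalty += w * abs(overall - tpr_g)
    return penalty

# Toy data (assumption): three groups, binary outcomes and binary predictions.
y_true = [1, 1, 1, 1, 0, 0, 1, 1]
y_pred = [1, 1, 0, 1, 0, 1, 0, 0]
groups = ["a", "a", "a", "b", "b", "b", "c", "c"]
weights = {"a": 1.0, "b": 1.0, "c": 2.0}   # one penalty weight per group

penalty = tpr_disparity_penalty(y_true, y_pred, groups, weights)
print(penalty)
```

In a penalized fit, a term like this would be added to the usual loss, with the per-group weights controlling the trade-off between fairness and overall fit.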

estimator, penalty weight, regression, (14 more...)

2512.1734

Country:

North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > Alaska (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Nephrology (1.00)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Neural Information Processing Systems, Oct-10-2025, 23:11:51 GMT

Rank-N-Contrast: Learning Continuous Representations for Regression

Rank-N-Contrast achieves state-of-the-art performance, highlighting its intriguing properties including better data efficiency, robustness to spurious targets and data corruptions, and generalization to distribution shifts.

dataset, exp, representation, (16 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Diagnostic Medicine (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Vision (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

Neural Information Processing Systems, Oct-2-2025, 19:07:42 GMT

Export Reviews, Discussions, Author Feedback and Meta-Reviews

First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. The paper proposes a new regression method, calibrated multivariate regression (CMR), for high-dimensional data analysis. Besides proposing the CMR formulation, the paper focuses on (1) using a smoothed proximal gradient method to compute CMR's optimal solutions and (2) analyzing CMR's statistical properties. One key contribution of the paper is the CMR formulation itself; its loss term can be interpreted as calibrating each regression task's loss term with respect to its noise level. I wonder whether there is a more intuitive interpretation behind the use of the noise level for calibration.
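
One way to build intuition for the calibration question is the sketch below. It assumes the calibrated loss is the sum of unsquared per-task residual norms, in contrast to the usual sum of squared norms; that form, the toy matrices, and the variable names are assumptions for illustration, and the regularizer is omitted.

```python
import numpy as np

# Toy problem (assumption): 2 samples, 1 feature, 2 regression tasks.
X = np.array([[1.0], [2.0]])
B = np.array([[1.0, 1.0]])                 # one coefficient per task
Y = np.array([[1.3, 4.0],
              [2.4, 6.0]])                 # task 2 is far noisier than task 1

residual_norms = np.linalg.norm(Y - X @ B, axis=0)   # one norm per task

squared_loss = float(np.sum(residual_norms ** 2))    # ordinary least squares loss
calibrated_loss = float(np.sum(residual_norms))      # assumed calibrated loss

# Share of each loss that comes from the noisy task.
share_squared = residual_norms[1] ** 2 / squared_loss
share_calibrated = residual_norms[1] / calibrated_loss
print(residual_norms, share_squared, share_calibrated)
```

Under this assumption the gradient of each task's term is the squared-loss gradient divided by twice that task's residual norm, so a noisy task is automatically down-weighted and dominates the objective less than it does under the squared loss.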

cmr, regularization parameter, variance, (13 more...)

Country: North America > Canada > Quebec > Montreal (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.76)

Métayer, Clémence, Ballesta, Annabelle, Martinelli, Julien

Data-driven Discovery of Digital Twins in Biomedical Research

arXiv.org Artificial Intelligence, Sep-3-2025

Recent technological advances have expanded the availability of high-throughput biological datasets, enabling the reliable design of digital twins of biomedical systems or patients. Such computational tools represent key reaction networks driving perturbation or drug response and can guide drug discovery and personalized therapeutics. Yet, their development still relies on laborious data integration by the human modeler, so that automated approaches are critically needed. The success of data-driven system discovery in Physics, rooted in clean datasets and well-defined governing laws, has fueled interest in applying similar techniques in Biology, which presents unique challenges. Here, we review methodologies for automatically inferring digital twins from biological time series, which mostly involve symbolic or sparse regression. We evaluate algorithms according to eight biological and methodological challenges, associated with noisy/incomplete data, multiple conditions, prior knowledge integration, latent variables, high dimensionality, unobserved variable derivatives, candidate library design, and uncertainty quantification. On these criteria, sparse regression generally outperformed symbolic regression, particularly when using Bayesian frameworks. We further highlight the emerging role of deep learning and large language models, which enable innovative prior knowledge integration, though the reliability and consistency of such approaches must be improved. While no single method addresses all challenges, we argue that progress in learning digital twins will come from hybrid and modular frameworks combining chemical reaction network-based mechanistic grounding, Bayesian uncertainty quantification, and the generative and knowledge integration capacities of deep learning. To support their development, we further propose a benchmarking framework to evaluate methods across all challenges.
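
As a rough illustration of what sparse regression over a candidate library looks like, here is a minimal sketch using sequentially thresholded least squares, one common sparse-regression scheme. The choice of algorithm, the function name `stlsq`, the polynomial library, and the noise-free logistic-growth data are assumptions for illustration; derivatives are supplied analytically here, which sidesteps the "unobserved variable derivatives" challenge the abstract lists.

```python
import numpy as np

def stlsq(theta, dxdt, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares over a candidate library."""
    coef = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(coef) < threshold
        coef[small] = 0.0                  # prune negligible library terms
        big = ~small
        if not big.any():
            break
        # Refit using only the surviving terms.
        coef[big] = np.linalg.lstsq(theta[:, big], dxdt, rcond=None)[0]
    return coef

# Toy system (assumption): logistic growth dx/dt = 1.5*x - 0.5*x^2.
x = np.linspace(0.1, 3.0, 50)
dxdt = 1.5 * x - 0.5 * x ** 2

# Candidate library: [1, x, x^2, x^3].
theta = np.column_stack([np.ones_like(x), x, x ** 2, x ** 3])
coef = stlsq(theta, dxdt)
print(coef)   # nonzero entries identify the selected terms
```

With noisy or incomplete measurements the threshold and the library design matter much more, which is where the Bayesian variants discussed in the abstract come in.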

large language model, machine learning, regression, (20 more...)

2508.21484

Country:

Europe (0.67)
North America > United States (0.46)

Genre:

Research Report (0.81)
Workflow (0.68)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
(5 more...)

Sella, Vignesh, Pham, Julie, Willcox, Karen, Chaudhuri, Anirban

Projection-based multifidelity linear regression for data-scarce applications

arXiv.org Machine Learning, Aug-13-2025

An important challenge in scientific machine learning is to develop methods that can exploit and maximize the amount of learning possible from scarce data [1-4]. The need for such methods arises often in science and engineering, especially in the case of computational fluid dynamics (CFD), since expensive-to-evaluate high-fidelity (HF) models make many-query problems such as uncertainty quantification, risk analysis, optimization, and optimization under uncertainty computationally prohibitive [5]. Surrogate models that approximate the solutions to HF models can facilitate the design and analysis process; however, a lack of sufficient HF data, in tandem with high-dimensional quantities of interest, adversely affects surrogate model accuracy. We propose multifidelity (MF) linear regression methods that leverage abundant low-cost, lower-fidelity (LF) data alongside limited HF data to construct linear regression models. These models operate within a reduced-dimensional subspace, obtained through principal component analysis (PCA), to effectively handle both training data scarcity and the high dimensionality (on the order of tens of thousands of quantities of interest) inherent in our problem setting. Linear regression has been widely utilized as a surrogate modeling approach in aerospace applications due to its simplicity and interpretability. We note that linear regression encompasses a broad class of models that are linear in their parameters but can include features that are arbitrarily nonlinear functions of the input variables [6].
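
The projection step described above can be sketched as follows: high-dimensional outputs are projected onto a few PCA modes, and a linear model is fit to the reduced coordinates. This is a single-fidelity sketch only; how the paper combines LF and HF data is not shown, and the function names, dimensions, and synthetic data are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_in, d_out, r = 30, 3, 500, 2

# Synthetic data (assumption): outputs lie in an r-dimensional subspace
# and depend linearly on the inputs.
X = rng.normal(size=(n, d_in))
W = rng.normal(size=(d_in, r))
modes = rng.normal(size=(r, d_out))
Y = X @ W @ modes

def fit_pca_linear(X, Y, r):
    """PCA on the outputs, then least squares from inputs to reduced coordinates."""
    y_mean = Y.mean(axis=0)
    _, _, Vt = np.linalg.svd(Y - y_mean, full_matrices=False)
    basis = Vt[:r]                                  # (r, d_out) principal directions
    Z = (Y - y_mean) @ basis.T                      # (n, r) reduced coordinates
    A = np.column_stack([np.ones(len(X)), X])       # intercept + linear features
    coef = np.linalg.lstsq(A, Z, rcond=None)[0]
    return y_mean, basis, coef

def predict(X, y_mean, basis, coef):
    """Predict reduced coordinates, then lift back to the full output space."""
    A = np.column_stack([np.ones(len(X)), X])
    return y_mean + (A @ coef) @ basis

model = fit_pca_linear(X, Y, r)
X_new = rng.normal(size=(5, d_in))
Y_new = predict(X_new, *model)
err = float(np.max(np.abs(Y_new - X_new @ W @ modes)))
print(Y_new.shape, err)
```

Only `r * (d_in + 1)` regression coefficients are estimated instead of `d_out * (d_in + 1)`, which is what makes the fit feasible with few training samples.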

artificial intelligence, machine learning, regression, (17 more...)

2508.08517

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry:

Aerospace & Defense (0.68)
Government > Regional Government > North America Government > United States Government (0.68)
Government > Military (0.68)
Transportation > Air (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Kurbucz, Marcell T., Tzivanakis, Nikolaos, Aslam, Nilufer Sari, Sykulski, Adam M.

SplitWise Regression: Stepwise Modeling with Adaptive Dummy Encoding

arXiv.org Machine Learning, May-22-2025

Capturing nonlinear relationships without sacrificing interpretability remains a persistent challenge in regression modeling. We introduce SplitWise, a novel framework that enhances stepwise regression. It adaptively transforms numeric predictors into threshold-based binary features using shallow decision trees, but only when such transformations improve model fit, as assessed by the Akaike Information Criterion (AIC) or Bayesian Information Criterion (BIC). This approach preserves the transparency of linear models while flexibly capturing nonlinear effects. Implemented as a user-friendly R package, SplitWise is evaluated on both synthetic and real-world datasets. The results show that it consistently produces more parsimonious and generalizable models than traditional stepwise and penalized regression techniques.
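
The core decision described above, whether to replace a numeric predictor with a threshold-based dummy, can be sketched in a few lines. This is not the SplitWise R package: it is a Python illustration that swaps the shallow decision tree for an exhaustive depth-1 split search, and the function names and synthetic step-shaped data are assumptions.

```python
import numpy as np

def aic(y, design):
    """Gaussian AIC up to an additive constant: n*log(RSS/n) + 2k."""
    coef = np.linalg.lstsq(design, y, rcond=None)[0]
    rss = float(np.sum((y - design @ coef) ** 2))
    n, k = design.shape
    return n * np.log(rss / n) + 2 * k

def best_stump_threshold(x, y):
    """Depth-1 regression split: the threshold minimizing squared error."""
    xs = np.unique(x)                               # sorted unique values
    candidates = (xs[:-1] + xs[1:]) / 2.0           # midpoints between neighbors
    def sse(t):
        left, right = y[x <= t], y[x > t]
        return np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
    return min(candidates, key=sse)

# Synthetic data (assumption): the response jumps when x exceeds 6.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.0 * (x > 6.0) + rng.normal(scale=0.3, size=200)

ones = np.ones_like(x)
threshold = best_stump_threshold(x, y)
aic_linear = aic(y, np.column_stack([ones, x]))
aic_dummy = aic(y, np.column_stack([ones, (x > threshold).astype(float)]))

# Keep the dummy encoding only if it improves the information criterion.
use_dummy = aic_dummy < aic_linear
print(threshold, aic_linear, aic_dummy, use_dummy)
```

Swapping `2 * k` for `k * np.log(n)` in `aic` would give the BIC variant of the same comparison.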

artificial intelligence, dataset, machine learning, (16 more...)

2505.15423

Country:

Europe > United Kingdom (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Asia > Malaysia > Penang (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.66)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)